
Linear Models

Linear models aim to predict a target value as a linear combination of input features. Commonly represented with:

\hat{y} = w_0 + w_1 x_1 + \dots + w_p x_p

where \hat{y} is the predicted value, w_0 is the intercept, and w_1, \dots, w_p are the coefficients.

Ordinary Least Squares

  • Goal: Minimize the residual sum of squares between observed targets and predictions.
  • Method: Fits a linear model to minimize the cost function \|y - Xw\|_2^2.
  • Complexity: Computed using the singular value decomposition of X; the cost is O(n p^2), assuming n ≥ p (with n samples and p features).
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
print(reg.coef_)

Non-Negative Least Squares

  • Constrains the coefficients to be non-negative. Useful when the coefficients represent naturally non-negative quantities such as prices or counts.
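A minimal sketch, assuming a recent scikit-learn in which LinearRegression accepts positive=True; the toy data is illustrative only:
from sklearn import linear_model
# positive=True constrains every fitted coefficient to be >= 0
reg = linear_model.LinearRegression(positive=True)
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
print(reg.coef_)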

Complexity of OLS

  • Computed using the singular value decomposition of X; the cost depends on the dimensions of the design matrix (see the complexity note above).

Ridge Regression

  • Goal: Address multicollinearity by adding a penalty on the size of coefficients.
  • Method: Minimizes \|y - Xw\|_2^2 + \alpha \|w\|_2^2, where \alpha is a complexity parameter.
  • Solver Choice: Automatically chosen based on conditions (e.g., data sparsity).
from sklearn import linear_model
reg = linear_model.Ridge(alpha=.5)
reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
print(reg.coef_)
print(reg.intercept_)

Ridge Classification

  • Handles binary classification by converting the targets to {-1, 1} and fitting the ridge regression objective; the sign of the prediction gives the class.
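A minimal sketch with RidgeClassifier on toy binary data (the inputs are illustrative only):
from sklearn import linear_model
# Binary targets are internally mapped to {-1, 1}; the sign of the
# regression output determines the predicted class.
clf = linear_model.RidgeClassifier(alpha=.5)
clf.fit([[0, 0], [1, 1], [2, 2], [3, 3]], [0, 0, 1, 1])
print(clf.predict([[0.5, 0.5], [2.5, 2.5]]))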

Lasso

  • Goal: Estimate sparse coefficients, effectively reducing the number of features.
  • Method: Minimizes \|y - Xw\|_2^2 + \alpha \|w\|_1.
  • Use: Important in the field of compressed sensing.
  • Feature Selection: Can be used to select features due to sparsity of solution.
from sklearn import linear_model
reg = linear_model.Lasso(alpha=0.1)
reg.fit([[0, 0], [1, 1]], [0, 1])
print(reg.predict([[1, 1]]))

Multi-task Lasso

  • Goal: Estimate sparse coefficients for multiple regression problems jointly.
  • Specificity: Same features selected across all regression tasks.
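A minimal sketch with MultiTaskLasso; Y has one column per task, and the toy data is illustrative only:
from sklearn import linear_model
# Each column of Y is a separate regression task; the same features end up
# non-zero across all tasks.
X = [[0., 0.], [1., 1.], [2., 2.]]
Y = [[0., 0.], [1., 1.], [2., 2.]]
reg = linear_model.MultiTaskLasso(alpha=0.1)
reg.fit(X, Y)
print(reg.coef_)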

Elastic-Net

  • Goal: Combine penalties of Ridge and Lasso, useful when multiple features are correlated.
  • Method: Minimizes \|y - Xw\|_2^2 + \alpha \rho \|w\|_1 + \frac{\alpha (1 - \rho)}{2} \|w\|_2^2.
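A minimal sketch with ElasticNet; l1_ratio plays the role of \rho above (1.0 is pure Lasso, 0.0 pure Ridge), and the toy data is illustrative only:
from sklearn import linear_model
# alpha sets the overall penalty strength, l1_ratio the L1/L2 mix
reg = linear_model.ElasticNet(alpha=0.1, l1_ratio=0.7)
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
print(reg.coef_)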

Multi-task Elastic-Net

  • Goal: Similar to Multi-task Lasso but with Elastic-Net penalty.
  • Use: Applicable when tasks share the same sparse features.
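A minimal sketch with MultiTaskElasticNet on the same kind of toy multi-output data as above:
from sklearn import linear_model
# Combined L1/L2 penalty applied jointly across tasks
X = [[0., 0.], [1., 1.], [2., 2.]]
Y = [[0., 0.], [1., 1.], [2., 2.]]
reg = linear_model.MultiTaskElasticNet(alpha=0.1, l1_ratio=0.5)
reg.fit(X, Y)
print(reg.coef_)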

Least Angle Regression (LARS)

  • Goal: Efficiently compute full path of coefficients.
  • Method: Similar to forward stepwise regression, adjusts direction to stay equiangular to all variables most correlated with residual.
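A minimal sketch with Lars, capping the number of non-zero coefficients; the toy data is illustrative only:
from sklearn import linear_model
# n_nonzero_coefs limits how many features enter the model along the path
reg = linear_model.Lars(n_nonzero_coefs=1)
reg.fit([[-1, 1], [0, 0], [1, 1]], [-1.1111, 0, -1.1111])
print(reg.coef_)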

Orthogonal Matching Pursuit (OMP)

  • Goal: Approximate the best fit under a constraint on the number of non-zero coefficients.
  • Method: Greedy algorithm selecting features most correlated with the residual.
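A minimal sketch with OrthogonalMatchingPursuit; the toy data is built so that only the first feature determines the target:
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit
# With a budget of one non-zero coefficient, OMP should pick the first feature
X = np.array([[0., 1.], [1., 0.], [2., 1.], [3., 0.]])
y = np.array([0., 1., 2., 3.])
reg = OrthogonalMatchingPursuit(n_nonzero_coefs=1)
reg.fit(X, y)
print(reg.coef_)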

Bayesian Regression

  • Goal: Incorporate regularization through prior distributions.
  • Types:
    • Bayesian Ridge: Regularization parameter estimated from data.
    • Automatic Relevance Determination (ARD): Similar to Bayesian Ridge but promotes sparsity.
from sklearn import linear_model
X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
Y = [0., 1., 2., 3.]
reg = linear_model.BayesianRidge()
reg.fit(X, Y)
print(reg.predict([[1, 0.]]))
print(reg.coef_)

Logistic Regression

  • Goal: Linear model for classification (despite the name).
  • Method: Uses the logistic function to model the probabilities of the possible classes.
  • Regularization: L1, L2, and Elastic-Net available to penalize model complexity.
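A minimal sketch with LogisticRegression on toy binary data; C is the inverse of the regularization strength and the L2 penalty is the default:
from sklearn.linear_model import LogisticRegression
X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
y = [0, 0, 1, 1]
clf = LogisticRegression(C=1.0)
clf.fit(X, y)
print(clf.predict([[0.5, 0.5], [2.5, 2.5]]))
print(clf.predict_proba([[1.5, 1.5]]))  # class membership probabilities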

Generalized Linear Models (GLM)

  • Extension: Allows prediction using different distributions from the exponential family.
  • Functionality: Models relationship through a link function and extends the model family beyond normally distributed errors.
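A minimal sketch with PoissonRegressor, a GLM with a log link for count-like targets (available in recent scikit-learn versions); the toy data is illustrative only:
from sklearn.linear_model import PoissonRegressor
# Poisson GLM: log link, suitable for non-negative count-like targets
X = [[1, 2], [2, 3], [3, 4], [4, 3]]
y = [12, 17, 22, 21]
reg = PoissonRegressor(alpha=0.5)
reg.fit(X, y)
print(reg.coef_)
print(reg.intercept_)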

Quantile Regression

  • Goal: Estimate medians or other quantiles, rather than means.
  • Method: Minimizes the pinball loss; useful for constructing prediction intervals.
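A minimal sketch with QuantileRegressor (available in recent scikit-learn versions), fitting the conditional median; the outlier in the toy targets pulls the mean but barely moves the median:
import numpy as np
from sklearn.linear_model import QuantileRegressor
X = np.array([[1.], [2.], [3.], [4.], [5.]])
y = np.array([1., 2., 3., 4., 100.])  # the last target is an outlier
# quantile=0.5 targets the median; alpha=0 disables the L1 penalty
reg = QuantileRegressor(quantile=0.5, alpha=0)
reg.fit(X, y)
print(reg.predict([[3.]]))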

Polynomial Regression

  • Goal: Extend linear models using polynomial basis functions.
  • Use: Can model non-linear relationships within a linear framework.

Polynomial Features Transformation

from sklearn.preprocessing import PolynomialFeatures
import numpy as np
X = np.arange(6).reshape(3, 2)
poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X))

Polynomial Regression Pipeline

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
import numpy as np
model = Pipeline([
    ('poly', PolynomialFeatures(degree=3)),
    ('linear', LinearRegression(fit_intercept=False)),
])
x = np.arange(5)
y = 3 - 2 * x + x ** 2 - x ** 3
model.fit(x[:, np.newaxis], y)
print(model.named_steps['linear'].coef_)